Siemens: Functional Data Analysis Pipeline¶

Index

  1. Loading the datasets
  2. Preprocessing steps
    • 2.1. Data wrangling on time series
    • 2.2. Data wrangling on additional features
    • 2.3. Merging time series datasets to add additional features
      • 2.3.1. Removal of TestIDs that exist in only one sensor
  3. Window extraction
    • 3.1. Validating if there are partial or full missing values after the extraction
    • 3.2. Validating shape post-window extraction
    • 3.3. Scaling the post-window data: zero-alignment
    • 3.4. Merging scaled data with additional attributes of interest
    • 3.5. Balancing the specific attributes
    • 3.6. Windows visualization (balanced data)
  4. FPCA characterization
    • 4.1. Functional PC1 plots (both systems): Characterization of FPC Scores
    • 4.2. Linear Regression for slope
  5. Functional Regression
    • 5.1. Regression coefficients
    • 5.2. Coefficients visualization
In [ ]:
# Change directory
import os
os.chdir("../../../..")
In [ ]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import altair as alt
import random
import statsmodels.api as sm
from skfda.representation.grid import FDataGrid
from skfda.preprocessing.dim_reduction import FPCA  # the "projection" submodule is deprecated
from skfda.exploratory.visualization import FPCAPlot
from sklearn.preprocessing import OneHotEncoder
import skfda
from skfda.ml.regression import LinearRegression
from skfda.representation.basis import FDataBasis, FourierBasis
from skfda.exploratory.depth import IntegratedDepth, ModifiedBandDepth
from skfda.exploratory.visualization import Boxplot
# Import designed-functions
from helper_functions.window_extraction import Merge_data, process_sensor_data, align_to_zero, balance_index
from helper_functions.time_series_visualization import plot_all_time_series, plot_all_time_series_and_mean_fpca, plot_all_time_series_in_group
from helper_functions.functionalPCA import fpca_two_inputs, first_component_extraction, bootstrap, create_pc_scores_plots, visualize_regression
from helper_functions.functional_regression import Function_regression, coefficent_visualization

1. Loading the datasets¶

The file paths can be changed depending on where the data is stored.

In [ ]:
# Import datasets
sensorA_System1 = pd.read_csv("RawData/System1_SensorA.csv")
sensorA_System2 = pd.read_csv("RawData/System2_SensorA.csv")
sensorB_System1 = pd.read_csv("RawData/System1_SensorB.csv")
sensorB_System2 = pd.read_csv("RawData/System2_SensorB.csv")
sensorA_System1_missing = pd.read_csv("RawData/SensorA_System1_missing values.csv")
sensorA_System2_missing = pd.read_csv("RawData/SensorA_System2_missing values.csv")
keyByTestID = pd.read_csv("RawData/Key by TestID.csv", parse_dates=['DateTime'])

2. Preprocessing steps¶

2.1. Data wrangling on time series¶

In [ ]:
# Transpose each dataset so that rows are tests and columns are timestamps

# Sensor A
A1_transposed = sensorA_System1.T.reset_index()
A1_transposed.columns = A1_transposed.iloc[0]
A1_transposed.rename(columns={A1_transposed.columns[0]: 'TestID'}, inplace=True)
A1_transposed = A1_transposed.drop(0)
A1_transposed['TestID'] = A1_transposed['TestID'].astype(int)

A2_transposed = sensorA_System2.T.reset_index()
A2_transposed.columns = A2_transposed.iloc[0]
A2_transposed.rename(columns={A2_transposed.columns[0]: 'TestID'}, inplace=True)
A2_transposed = A2_transposed.drop(0)
A2_transposed['TestID'] = A2_transposed['TestID'].astype(int)

A1_missing_transposed = sensorA_System1_missing.T.reset_index()
A1_missing_transposed.columns = A1_missing_transposed.iloc[0]
A1_missing_transposed.rename(columns={A1_missing_transposed.columns[0]: 'TestID'}, inplace=True)
A1_missing_transposed = A1_missing_transposed.drop(0)
A1_missing_transposed['TestID'] = A1_missing_transposed['TestID'].astype(int)

A2_missing_transposed = sensorA_System2_missing.T.reset_index()
A2_missing_transposed.columns = A2_missing_transposed.iloc[0]
A2_missing_transposed.rename(columns={A2_missing_transposed.columns[0]: 'TestID'}, inplace=True)
A2_missing_transposed = A2_missing_transposed.drop(0)
A2_missing_transposed['TestID'] = A2_missing_transposed['TestID'].astype(int)

# Sensor B
B1_transposed = sensorB_System1.T.reset_index()
B1_transposed.columns = B1_transposed.iloc[0]
B1_transposed.rename(columns={B1_transposed.columns[0]: 'TestID'}, inplace=True)
B1_transposed = B1_transposed.drop(0)
B1_transposed['TestID'] = B1_transposed['TestID'].astype(int)

B2_transposed = sensorB_System2.T.reset_index()
B2_transposed.columns = B2_transposed.iloc[0]
B2_transposed.rename(columns={B2_transposed.columns[0]: 'TestID'}, inplace=True)
B2_transposed = B2_transposed.drop(0)
B2_transposed['TestID'] = B2_transposed['TestID'].astype(int)
In [ ]:
# Complete A1 and A2: drop the tests that also appear in the missing-values files, then append the rows from those files
A1_transposed_mid = A1_transposed[~A1_transposed.TestID.isin(A1_missing_transposed.TestID)]
A1_transposed = pd.concat([A1_transposed_mid, A1_missing_transposed], axis=0)
A2_transposed_mid = A2_transposed[~A2_transposed.TestID.isin(A2_missing_transposed.TestID)]
A2_transposed = pd.concat([A2_transposed_mid, A2_missing_transposed], axis=0)
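
The two cells above can be tried on a toy table. This is a minimal sketch, assuming the raw files hold timestamps in the first column and one column per TestID; the helper name `transpose_sensor` and the toy values are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy raw table: first column holds timestamps, remaining columns are TestIDs.
raw = pd.DataFrame({"Time": [0.0, 0.2, 0.4],
                    "101": [1.0, 1.1, 1.2],
                    "102": [2.0, None, 2.2]})
# Toy patch table with the complete series for test 102.
patch = pd.DataFrame({"Time": [0.0, 0.2, 0.4],
                      "102": [2.0, 2.1, 2.2]})

def transpose_sensor(df):
    """Hypothetical helper: return one row per test and one column per timestamp."""
    out = df.T.reset_index()
    out.columns = out.iloc[0]  # first row (timestamps) becomes the header
    out = out.rename(columns={out.columns[0]: "TestID"}).drop(0)
    out["TestID"] = out["TestID"].astype(int)
    return out

full = transpose_sensor(raw)
fix = transpose_sensor(patch)

# Drop the tests that have a patched version, then append the patched rows.
completed = pd.concat([full[~full.TestID.isin(fix.TestID)], fix], axis=0)
print(completed)
```

Wrapping the five repeated lines in one function would also remove the copy-paste blocks for A1, A2, B1 and B2.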

2.2. Data wrangling on additional features¶

In [ ]:
# Relabeling System Values
keyByTestID["System"] = keyByTestID["System"].replace({"System 2A":"System 2","System 2B":"System 2"})

# Create a new column that fills missing fluid temperatures with the ambient temperature
# Note: when a fluid temperature is specified, it is taken as the temperature of the sample fluid; the rest of the system is assumed to be at ambient temperature.
keyByTestID['Fluid_Temperature_Filled'] = keyByTestID['Fluid Temperature'].combine_first(keyByTestID['AmbientTemperature'])

# Binning 

# Categorize 'FluidType' into Blood and Aqueous
keyByTestID['FluidTypeBin'] = np.where(keyByTestID['FluidType'].str.startswith('Eurotrol'), 'Aqueous', 'Blood')

# Categorize 'AgeOfCardInDaysAtTimeOfTest' into bins
keyByTestID["CardAgeBin"] = pd.cut(keyByTestID["AgeOfCardInDaysAtTimeOfTest"], bins=[0, 9, 28, 56, 84, 112, 140, 168, 196, 224, 252],
                                   labels=['[0-9]', '(9-28]', '(28-56]', '(56-84]', '(84-112]', '(112-140]', '(140-168]', '(168-196]', '(196-224]', '(224-252]'])


# Categorize 'Fluid_Temperature_Filled' into bins
keyByTestID["FluidTempBin"] = pd.cut(keyByTestID["Fluid_Temperature_Filled"], bins=[-1, 20, 25, 100], labels=['Below 20', '20-25', 'Above 25'])

# Filtering successful tests
keyByTestID = keyByTestID[keyByTestID['ReturnCode'].isin(['Success','UnderReportableRange'])]
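
A note on the card-age binning: with its default settings `pd.cut` builds right-closed intervals and leaves the lowest edge open, so a value exactly equal to the first edge (0 here) is not assigned to any bin. The sketch below shows this on made-up ages. Whether the real data contains cards of age 0 is not known from this notebook, so this is a check to consider rather than a confirmed problem.

```python
import pandas as pd

# Made-up ages, chosen to sit on and around the bin edges.
ages = pd.Series([0, 5, 9, 10, 28, 252], name="AgeOfCardInDaysAtTimeOfTest")
edges = [0, 9, 28, 56, 84, 112, 140, 168, 196, 224, 252]
labels = ['[0-9]', '(9-28]', '(28-56]', '(56-84]', '(84-112]', '(112-140]',
          '(140-168]', '(168-196]', '(196-224]', '(224-252]']

# Default behaviour: intervals are (0, 9], (9, 28], ... so age 0 becomes NaN.
default_bins = pd.cut(ages, bins=edges, labels=labels)

# include_lowest=True closes the first interval on the left, matching '[0-9]'.
closed_bins = pd.cut(ages, bins=edges, labels=labels, include_lowest=True)

print(pd.DataFrame({"age": ages, "default": default_bins, "include_lowest": closed_bins}))
```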

2.3. Merging time series datasets to add additional features¶

In [ ]:
# Merge dataset with keyByTestID and delete unmatched tests
keyByTestID['TestID'] = keyByTestID['TestID'].astype(int)
keyByTestID['System'] = keyByTestID['System'].astype(str)

A1_keyByTestID = keyByTestID[(keyByTestID['Sensor'] == 'Sensor A') & (keyByTestID['System'] == 'System 1')]
A1_Merged = pd.merge(A1_keyByTestID,A1_transposed,how='inner', on=['TestID'])
A1_transposed = A1_transposed[A1_transposed['TestID'].isin(A1_Merged['TestID'])]

A2_keyByTestID = keyByTestID.loc[(keyByTestID['Sensor'] == 'Sensor A') & (keyByTestID['System'] != 'System 1')]
A2_Merged = pd.merge(A2_keyByTestID,A2_transposed,how='inner', on=['TestID'])
A2_transposed = A2_transposed[A2_transposed['TestID'].isin(A2_Merged['TestID'])]

sensorA_System1 = sensorA_System1.loc[:, sensorA_System1.columns.isin(A1_Merged['TestID'].astype(str))]
sensorA_System2 = sensorA_System2.loc[:, sensorA_System2.columns.isin(A2_Merged['TestID'].astype(str))]


B1_keyByTestID = keyByTestID[(keyByTestID['Sensor'] == 'Sensor B') & (keyByTestID['System'] == 'System 1')]
B1_Merged = pd.merge(B1_keyByTestID,B1_transposed,how='inner', on=['TestID'])
B1_transposed = B1_transposed[B1_transposed['TestID'].isin(B1_Merged['TestID'])]

B2_keyByTestID = keyByTestID.loc[(keyByTestID['Sensor'] == 'Sensor B') & (keyByTestID['System'] != 'System 1')]
B2_Merged = pd.merge(B2_keyByTestID,B2_transposed,how='inner', on=['TestID'])
B2_transposed = B2_transposed[B2_transposed['TestID'].isin(B2_Merged['TestID'])]

sensorB_System1 = sensorB_System1.loc[:, sensorB_System1.columns.isin(B1_Merged['TestID'].astype(str))]
sensorB_System2 = sensorB_System2.loc[:, sensorB_System2.columns.isin(B2_Merged['TestID'].astype(str))]

print('A1: ', A1_Merged.shape)
print('A2: ', A2_Merged.shape)
print('B1: ', B1_Merged.shape)
print('B2: ', B2_Merged.shape)
A1:  (3382, 3380)
A2:  (7743, 3371)
B1:  (3375, 3380)
B2:  (7745, 3371)

2.3.1. Removal of TestIDs that exist in only one sensor¶

In [ ]:
# Note: Only run once. If not, restart the kernel and run from the beginning again.
A1_Merged = A1_Merged[A1_Merged["TestID"].isin(B1_Merged["TestID"])]
B1_Merged = B1_Merged[B1_Merged["TestID"].isin(A1_Merged["TestID"])]

A2_Merged = A2_Merged[A2_Merged["TestID"].isin(B2_Merged["TestID"])]
B2_Merged = B2_Merged[B2_Merged["TestID"].isin(A2_Merged["TestID"])]
print('A1: ', A1_Merged.shape)
print('A2: ', A2_Merged.shape)
print('B1: ', B1_Merged.shape)
print('B2: ', B2_Merged.shape)
A1:  (3374, 3380)
A2:  (7743, 3371)
B1:  (3374, 3380)
B2:  (7743, 3371)
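
The same filtering can be written with an explicit intersection. This is a minimal sketch on toy frames (the column names `a0` and `b0` are hypothetical). Computing the common TestIDs once makes the result independent of the order in which the two sensors are filtered.

```python
import pandas as pd

sensor_a = pd.DataFrame({"TestID": [1, 2, 3, 4], "a0": [0.1, 0.2, 0.3, 0.4]})
sensor_b = pd.DataFrame({"TestID": [2, 3, 5], "b0": [1.2, 1.3, 1.5]})

# TestIDs recorded by both sensors.
common = set(sensor_a["TestID"]) & set(sensor_b["TestID"])

sensor_a_common = sensor_a[sensor_a["TestID"].isin(common)]
sensor_b_common = sensor_b[sensor_b["TestID"].isin(common)]

print(sensor_a_common.shape, sensor_b_common.shape)
```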

3. Window extraction¶

In [ ]:
# Match window values of Sensor A and B for each test

# Sensor A
calDelimit = 11
cal_window_size = 8
sampleDelimit = 15
sample_window_size = 5

A1_cal_window, A1_sample_window = process_sensor_data(A1_Merged, calDelimit, cal_window_size, sampleDelimit, sample_window_size)
A2_cal_window, A2_sample_window = process_sensor_data(A2_Merged, calDelimit, cal_window_size, sampleDelimit, sample_window_size)


# Sensor B
calDelimit = 20
cal_window_size = 18
sampleDelimit_blood = 24
sampleDelimit_aqueous = 30
sample_window_size = 4

B1_cal_window, B1_sample_window = process_sensor_data(B1_Merged, calDelimit, cal_window_size, sampleDelimit_blood, sample_window_size, sampleDelimit_aqueous)
B2_cal_window, B2_sample_window = process_sensor_data(B2_Merged, calDelimit, cal_window_size, sampleDelimit_blood, sample_window_size, sampleDelimit_aqueous)
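
The helper `process_sensor_data` is not shown in this notebook. As a rough illustration of what a window extraction can look like, the sketch below keeps the columns whose timestamp falls inside a closed interval. The function `extract_window`, the start times and the 0.2 sampling step are assumptions made for the example. Under those assumptions, windows of size 8 and 5 contain 41 and 26 points, which happens to agree with the Sensor A shapes printed in section 3.2, but the real helper may place the windows differently relative to the delimiters.

```python
import numpy as np
import pandas as pd

def extract_window(df, start, size):
    """Hypothetical helper: keep columns with timestamp in [start, start + size]."""
    times = df.columns.astype(float)
    tol = 1e-9  # guards against floating-point noise at the window edges
    mask = (times >= start - tol) & (times <= start + size + tol)
    return df.loc[:, mask]

# Two toy tests sampled every 0.2 time units from 0 to 30 (assumed grid).
grid = np.round(np.linspace(0, 30, 151), 1)
toy = pd.DataFrame(np.random.default_rng(0).normal(size=(2, grid.size)),
                   index=pd.Index([101, 102], name="TestID"), columns=grid)

cal = extract_window(toy, start=3, size=8)      # assumed calibration window
sample = extract_window(toy, start=15, size=5)  # assumed sample window
print(cal.shape, sample.shape)
```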

3.1. Validating if there are partial or full missing values after the extraction¶

In [ ]:
A1_cal_window_drop_index = A1_cal_window.loc[A1_cal_window.isna().sum(axis=1)!=0].index
A2_cal_window_drop_index = A2_cal_window.loc[A2_cal_window.isna().sum(axis=1)!=0].index

A1_sample_window_drop_index = A1_sample_window.loc[A1_sample_window.isna().sum(axis=1)!=0].index
A2_sample_window_drop_index = A2_sample_window.loc[A2_sample_window.isna().sum(axis=1)!=0].index

B1_cal_window_drop_index = B1_cal_window.loc[B1_cal_window.isna().sum(axis=1)!=0].index
B2_cal_window_drop_index = B2_cal_window.loc[B2_cal_window.isna().sum(axis=1)!=0].index

B1_sample_window_drop_index = B1_sample_window.loc[B1_sample_window.isna().sum(axis=1)!=0].index
B2_sample_window_drop_index = B2_sample_window.loc[B2_sample_window.isna().sum(axis=1)!=0].index

# Check whether the tests with missing values differ between the calibration and sample windows
print("The missing value in calibration window:",A1_cal_window_drop_index)
print("The missing value in sample window:",A1_sample_window_drop_index)
print("The missing value in calibration window:",A2_cal_window_drop_index)
print("The missing value in sample window:",A2_sample_window_drop_index)

print("The missing value in calibration window:",B1_cal_window_drop_index)
print("The missing value in sample window:",B1_sample_window_drop_index)
print("The missing value in calibration window:",B2_cal_window_drop_index)
print("The missing value in sample window:",B2_sample_window_drop_index)
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')
The missing value in calibration window: Int64Index([], dtype='int64', name='TestID')
The missing value in sample window: Int64Index([], dtype='int64', name='TestID')
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')
The missing value in calibration window: Float64Index([], dtype='float64', name='TestID')
The missing value in sample window: Float64Index([], dtype='float64', name='TestID')
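
To separate partially missing rows from fully missing ones, the per-row NaN count can be compared with the number of columns. A minimal sketch on a toy window (the values are made up):

```python
import numpy as np
import pandas as pd

window = pd.DataFrame([[0.1, 0.2, 0.3],
                       [0.4, np.nan, 0.6],
                       [np.nan, np.nan, np.nan]],
                      index=pd.Index([101, 102, 103], name="TestID"))

missing_per_row = window.isna().sum(axis=1)
n_points = window.shape[1]

# Rows with some, but not all, values missing.
partial = window.index[(missing_per_row > 0) & (missing_per_row < n_points)]
# Rows where every value is missing.
full = window.index[missing_per_row == n_points]

print("Partially missing:", list(partial))
print("Fully missing:", list(full))
```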

3.2. Validating data shape post-window extraction¶

In [ ]:
# Set TestID as the index of the merged datasets
A1_Merged.set_index("TestID", inplace=True)
A2_Merged.set_index("TestID", inplace=True)
B1_Merged.set_index("TestID", inplace=True)
B2_Merged.set_index("TestID", inplace=True)

# Find the TestIDs that are missing from the extracted windows
print("The problem indexes after extract the window are:",A1_Merged.index.difference(A1_cal_window.index))
print("The problem indexes after extract the window are:",A1_Merged.index.difference(A1_sample_window.index))
print("The problem indexes after extract the window are:",A2_Merged.index.difference(A2_cal_window.index))
print("The problem indexes after extract the window are:",A2_Merged.index.difference(A2_sample_window.index))

print("The problem indexes after extract the window are:",B1_Merged.index.difference(B1_cal_window.index))
print("The problem indexes after extract the window are:",B1_Merged.index.difference(B1_sample_window.index))
print("The problem indexes after extract the window are:",B2_Merged.index.difference(B2_cal_window.index))
print("The problem indexes after extract the window are:",B2_Merged.index.difference(B2_sample_window.index))

A1_Merged = A1_Merged.drop(A1_Merged.index.difference(A1_cal_window.index))
A1_Merged = A1_Merged.drop(A1_Merged.index.difference(A1_sample_window.index))
A2_Merged = A2_Merged.drop(A2_Merged.index.difference(A2_cal_window.index))
A2_Merged = A2_Merged.drop(A2_Merged.index.difference(A2_sample_window.index))

B1_Merged = B1_Merged.drop(B1_Merged.index.difference(B1_cal_window.index))
B1_Merged = B1_Merged.drop(B1_Merged.index.difference(B1_sample_window.index))
B2_Merged = B2_Merged.drop(B2_Merged.index.difference(B2_cal_window.index))
B2_Merged = B2_Merged.drop(B2_Merged.index.difference(B2_sample_window.index))

# Drop rows whose index is NaN (Sensor A)
A1_cal_window = A1_cal_window[~A1_cal_window.index.isna()]
A1_sample_window = A1_sample_window[~A1_sample_window.index.isna()]
A2_cal_window = A2_cal_window[~A2_cal_window.index.isna()]
A2_sample_window = A2_sample_window[~A2_sample_window.index.isna()]

# Drop rows whose index is NaN (Sensor B)
B1_cal_window = B1_cal_window[~B1_cal_window.index.isna()]
B1_sample_window = B1_sample_window[~B1_sample_window.index.isna()]
B2_cal_window = B2_cal_window[~B2_cal_window.index.isna()]
B2_sample_window = B2_sample_window[~B2_sample_window.index.isna()]
The problem indexes after extract the window are: Int64Index([12470355, 12470361, 12470365, 12537663, 12539049, 12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12470355, 12470361, 12470365, 12537663, 12539049, 12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([12622570], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([3518677, 3518678], dtype='int64', name='TestID')
The problem indexes after extract the window are: Int64Index([3518677, 3518678], dtype='int64', name='TestID')
In [ ]:
# Shape of the subsets of time series after the extraction from the windows

# Cal Window
print('Shape of the time series after extraction')
print('A1_cal_window: ', A1_cal_window.shape)
print('A2_cal_window: ', A2_cal_window.shape)
print('B1_cal_window: ', B1_cal_window.shape)
print('B2_cal_window: ', B2_cal_window.shape)

# Sample Window
print('A1_sample_window: ', A1_sample_window.shape)
print('A2_sample_window: ', A2_sample_window.shape)
print('B1_sample_window: ', B1_sample_window.shape)
print('B2_sample_window: ', B2_sample_window.shape)

# The unmatched indexes could be deleted, but this is not necessary
Shape of the time series after extraction
A1_cal_window:  (3368, 41)
A2_cal_window:  (7743, 41)
B1_cal_window:  (3373, 91)
B2_cal_window:  (7741, 91)
A1_sample_window:  (3368, 26)
A2_sample_window:  (7743, 26)
B1_sample_window:  (3373, 21)
B2_sample_window:  (7741, 21)

3.3. Scaling the post-window data: zero-alignment¶

In [ ]:
# Cal Window

A1_cal_window_zero = align_to_zero(A1_cal_window)
A2_cal_window_zero = align_to_zero(A2_cal_window)
B1_cal_window_zero = align_to_zero(B1_cal_window)
B2_cal_window_zero = align_to_zero(B2_cal_window)


# Sample Window

A1_sample_window_zero = align_to_zero(A1_sample_window)
A2_sample_window_zero = align_to_zero(A2_sample_window)
B1_sample_window_zero = align_to_zero(B1_sample_window)
B2_sample_window_zero = align_to_zero(B2_sample_window)
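
The helper `align_to_zero` is not shown here. One common way to zero-align curves is to subtract each curve's first value so that every series starts at 0. The sketch below assumes that definition; the function name `align_rows_to_zero` is hypothetical and the real helper may use a different reference point.

```python
import pandas as pd

def align_rows_to_zero(df):
    """Hypothetical helper: subtract each row's first value from the whole row."""
    return df.sub(df.iloc[:, 0], axis=0)

toy_window = pd.DataFrame([[5.0, 5.5, 6.0],
                           [2.0, 1.0, 0.5]],
                          index=pd.Index([101, 102], name="TestID"),
                          columns=[0.0, 0.2, 0.4])

aligned = align_rows_to_zero(toy_window)
print(aligned)
```

Only the offset changes under this transformation: the shape of each curve and the differences between consecutive points are preserved.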

3.4. Merging scaled data with additional attributes of interest¶

In [ ]:
# Combine data: Merge the zero-aligned time series with "FluidType", "AgeOfCardInDaysAtTimeOfTest", "Fluid_Temperature_Filled", "FluidTypeBin", "CardAgeBin", "FluidTempBin"
A1_cal_window_combine = Merge_data(A1_cal_window_zero,A1_Merged)
A2_cal_window_combine = Merge_data(A2_cal_window_zero,A2_Merged)

B1_cal_window_combine = Merge_data(B1_cal_window_zero,B1_Merged)
B2_cal_window_combine = Merge_data(B2_cal_window_zero,B2_Merged)

## Sample window
A1_sample_window_combine = Merge_data(A1_sample_window_zero,A1_Merged)
A2_sample_window_combine = Merge_data(A2_sample_window_zero,A2_Merged)

B1_sample_window_combine = Merge_data(B1_sample_window_zero,B1_Merged)
B2_sample_window_combine = Merge_data(B2_sample_window_zero,B2_Merged)
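
`Merge_data` is also a project helper that is not shown. The later cells select the curves with `.iloc[:, :-6]`, which suggests that the six attributes are appended as the last six columns. A minimal sketch under that assumption (the function `merge_attributes` and the toy values are hypothetical):

```python
import pandas as pd

ATTRIBUTES = ["FluidType", "AgeOfCardInDaysAtTimeOfTest", "Fluid_Temperature_Filled",
              "FluidTypeBin", "CardAgeBin", "FluidTempBin"]

def merge_attributes(window, merged):
    """Hypothetical helper: append the attribute columns to the curves, joined on the TestID index."""
    return window.join(merged[ATTRIBUTES], how="inner")

window = pd.DataFrame([[0.0, 0.5, 1.0], [0.0, -1.0, -1.5]],
                      index=pd.Index([1, 2], name="TestID"), columns=[0.0, 0.2, 0.4])
merged = pd.DataFrame({"FluidType": ["Eurotrol L1", "Blood A", "Blood B"],
                       "AgeOfCardInDaysAtTimeOfTest": [5, 30, 60],
                       "Fluid_Temperature_Filled": [22.0, 19.0, 26.0],
                       "FluidTypeBin": ["Aqueous", "Blood", "Blood"],
                       "CardAgeBin": ["[0-9]", "(28-56]", "(56-84]"],
                       "FluidTempBin": ["20-25", "Below 20", "Above 25"]},
                      index=pd.Index([1, 2, 3], name="TestID"))

combined = merge_attributes(window, merged)
curves_only = combined.iloc[:, :-6]  # recovers the time-series columns
print(combined.shape, curves_only.shape)
```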

3.5. Balancing the specific attributes¶

In [ ]:
System1_Index, System2_Index =  balance_index(A1_cal_window_combine,A2_cal_window_combine,"CardAgeBin")
System1 Sensor A & B distribution:
 [0-9]        142
(9-28]       142
(28-56]      142
(56-84]      142
(84-112]     142
(112-140]    142
(140-168]    142
(168-196]    142
(196-224]    142
(224-252]    142
Name: CardAgeBin, dtype: int64

 System2 Sensor A & B distribution:
 [0-9]        142
(9-28]       142
(28-56]      142
(56-84]      142
(84-112]     142
(112-140]    142
(140-168]    142
(168-196]    142
(196-224]    142
(224-252]    142
Name: CardAgeBin, dtype: int64
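
The output above shows the same count in every bin for both systems. One way to obtain such a balance is to downsample every bin to the smallest bin count found across the two systems. The sketch below illustrates that idea on toy data. The function `balanced_index` and the fixed seed are assumptions, and the actual `balance_index` helper may work differently.

```python
import pandas as pd

def balanced_index(df, column, n_per_group, seed=0):
    """Hypothetical helper: sample n_per_group rows from every category and return their index."""
    sampled = df.groupby(column, observed=True).sample(n=n_per_group, random_state=seed)
    return sampled.index

s1 = pd.DataFrame({"CardAgeBin": ["[0-9]"] * 5 + ["(9-28]"] * 3}, index=range(100, 108))
s2 = pd.DataFrame({"CardAgeBin": ["[0-9]"] * 2 + ["(9-28]"] * 6}, index=range(200, 208))

# Smallest bin count across both systems.
n = int(min(s1["CardAgeBin"].value_counts().min(),
            s2["CardAgeBin"].value_counts().min()))

idx1 = balanced_index(s1, "CardAgeBin", n)
idx2 = balanced_index(s2, "CardAgeBin", n)

print(s1.loc[idx1, "CardAgeBin"].value_counts())
print(s2.loc[idx2, "CardAgeBin"].value_counts())
```

Fixing the random seed keeps the balanced subset reproducible between runs.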
In [ ]:
# Balanced data
A1_cal_window_combine_balanced = A1_cal_window_combine.loc[System1_Index]
A1_sample_window_combine_balanced = A1_sample_window_combine.loc[System1_Index]
A2_cal_window_combine_balanced = A2_cal_window_combine.loc[System2_Index]
A2_sample_window_combine_balanced = A2_sample_window_combine.loc[System2_Index]

B1_cal_window_combine_balanced = B1_cal_window_combine.loc[System1_Index]
B1_sample_window_combine_balanced = B1_sample_window_combine.loc[System1_Index]
B2_cal_window_combine_balanced = B2_cal_window_combine.loc[System2_Index]
B2_sample_window_combine_balanced = B2_sample_window_combine.loc[System2_Index]

3.6. Windows visualization¶

Card Age¶

System 1 and System 2: Sensor A - Cal and Sample Windows¶

In [ ]:
# Plot all the balanced time series from the window extraction
plot_all_time_series_in_group(A1_cal_window_combine_balanced, A1_sample_window_combine_balanced, A2_cal_window_combine_balanced, A2_sample_window_combine_balanced, "CardAgeBin", "System 1A - CalWindow", "System 1A - SampleWindow","System 2A - CalWindow", "System 2A - SampleWindow")

System 1 and System 2: Sensor B - Cal and Sample Windows¶

In [ ]:
# Plot all the balanced time series from the window extraction
plot_all_time_series_in_group(B1_cal_window_combine_balanced, B1_sample_window_combine_balanced, B2_cal_window_combine_balanced, B2_sample_window_combine_balanced, "CardAgeBin", "System 1B - CalWindow", "System 1B - SampleWindow","System 2B - CalWindow", "System 2B - SampleWindow")

4. FPCA characterization¶

4.1. Functional PC1 plots (both systems) and Characterization of FPC Scores¶

The following section presents:

  1. Percentage of variance explained by the components.
  2. Time series contributing most to each component.
  3. Plot 1-2: All the waveforms and the mean function.
  4. Plot 3-4: First two components in each system.
  5. Plot 5: First component (eigenfunction) of the two systems.
  6. Plot 6: The confidence interval of the mean first component, computed using bootstrap.
  7. Plot 7-8: The boxplots of the generated samples of the first component. The boxplots show the percentiles of the first component.
    • Red dashed lines indicate detected outliers.
    • Red area shows the box region.
  8. Plot 9-12: FPC scores, color-mapped by attribute.
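
To make the quantities in this list concrete, the sketch below computes explained variance, scores and a bootstrap percentile band for the first component on simulated curves. It uses a plain PCA of the discretised curves with NumPy's SVD as a simplified stand-in for the scikit-fda `FPCA` used by the project helpers. The simulated data, the number of bootstrap replicates and the function `first_components` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 41)

# Simulated curves: a line with a random slope plus small noise, one row per test.
slopes = rng.normal(1.0, 0.3, size=200)
curves = slopes[:, None] * t[None, :] + rng.normal(0, 0.01, size=(200, t.size))

def first_components(x, n_components=2):
    """Discretised PCA: return components, explained-variance ratios and scores."""
    centred = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    scores = centred @ vt[:n_components].T
    return vt[:n_components], ratio[:n_components], scores

components, ratio, scores = first_components(curves)
print("Explained variance PC1 (%):", 100 * ratio[0])

# Bootstrap the first component: resample curves with replacement and refit.
boot = []
for _ in range(200):
    resampled = curves[rng.integers(0, len(curves), size=len(curves))]
    pc1 = first_components(resampled, 1)[0][0]
    # Components are defined up to sign, so align each replicate with the reference.
    if pc1 @ components[0] < 0:
        pc1 = -pc1
    boot.append(pc1)

# Pointwise 95% percentile band for the first component.
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
```

A first component that explains nearly all the variance, as in the outputs below, means that the curves differ mostly along one mode of variation.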

System 1 versus System 2: Sensor A - Cal Window¶

In [ ]:
pc_scores_s1_A_cal_window, pc_scores_s2_A_cal_window,fpca_s1_A_cal_window,fpca_s2_A_cal_window = fpca_two_inputs(A1_cal_window_combine_balanced.iloc[:,:-6], A2_cal_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
ac1, ac2 = bootstrap(A1_cal_window_combine_balanced, A2_cal_window_combine_balanced,"A","cal_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_A_cal_window, pc_scores_s2_A_cal_window, A1_cal_window_combine_balanced, A2_cal_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.87217788257936
S1 Explain variance PC2 (%):  0.03137443830513182
S2 Explain variance PC1 (%):  99.93904177973562
S2 Explain variance PC2 (%):  0.022619270205938954
The time series contributing most to PC1 is at index 800 with TestID 12529762.0
The time series contributing most to PC2 is at index 82 with TestID 12615989.0
The time series contributing most to PC1 is at index 91 with TestID 3568638
The time series contributing most to PC2 is at index 19 with TestID 3559978
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------

System 1 versus System 2: Sensor A - Sample Window¶

In [ ]:
pc_scores_s1_A_sample_window, pc_scores_s2_A_sample_window,fpca_s1_A_sample_window,fpca_s2_A_sample_window = fpca_two_inputs(A1_sample_window_combine_balanced.iloc[:,:-6], A2_sample_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
as1,as2 = bootstrap(A1_sample_window_combine_balanced, A2_sample_window_combine_balanced,"A","sample_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_A_sample_window, pc_scores_s2_A_sample_window, A1_sample_window_combine_balanced, A2_sample_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.54001643310664
S1 Explain variance PC2 (%):  0.13376186892582548
S2 Explain variance PC1 (%):  99.83602096130872
S2 Explain variance PC2 (%):  0.06238709532612137
The time series contributing most to PC1 is at index 800 with TestID 12529762.0
The time series contributing most to PC2 is at index 261 with TestID 12515884.0
The time series contributing most to PC1 is at index 140 with TestID 3568703
The time series contributing most to PC2 is at index 742 with TestID 3555912
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------

System 1 versus System 2: Sensor B - Cal Window¶

In [ ]:
pc_scores_s1_B_cal_window, pc_scores_s2_B_cal_window,fpca_s1_B_cal_window,fpca_s2_B_cal_window = fpca_two_inputs(B1_cal_window_combine_balanced.iloc[:,:-6], B2_cal_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
bc1,bc2 = bootstrap(B1_cal_window_combine_balanced, B2_cal_window_combine_balanced,"B","cal_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_B_cal_window, pc_scores_s2_B_cal_window, B1_cal_window_combine_balanced, B2_cal_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.85065134319603
S1 Explain variance PC2 (%):  0.08925385168183501
S2 Explain variance PC1 (%):  99.87328890366544
S2 Explain variance PC2 (%):  0.09674564839625106
The time series contributing most to PC1 is at index 82 with TestID 12615989.0
The time series contributing most to PC2 is at index 664 with TestID 12371094.0
The time series contributing most to PC1 is at index 53 with TestID 3565690.0
The time series contributing most to PC2 is at index 53 with TestID 3565690.0
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------

System 1 versus System 2: Sensor B - Sample Window¶

In [ ]:
pc_scores_s1_B_sample_window, pc_scores_s2_B_sample_window,fpca_s1_B_sample_window,fpca_s2_B_sample_window = fpca_two_inputs(B1_sample_window_combine_balanced.iloc[:,:-6], B2_sample_window_combine_balanced.iloc[:,:-6], color_fpc1_s1='tab:blue', color_fpc2_s1='tab:cyan', color_fpc1_s2='tab:orange', color_fpc2_s2='gold')
print("--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------")
bs1,bs2 = bootstrap(B1_sample_window_combine_balanced, B2_sample_window_combine_balanced, "B","sample_window",features="CardAgeBin")
print("--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------")
create_pc_scores_plots(pc_scores_s1_B_sample_window, pc_scores_s2_B_sample_window, B1_sample_window_combine_balanced, B2_sample_window_combine_balanced,features="CardAgeBin")
S1 Explain variance PC1 (%):  99.79199457851556
S1 Explain variance PC2 (%):  0.05709964987162769
S2 Explain variance PC1 (%):  99.88982442479454
S2 Explain variance PC2 (%):  0.045673718167298656
The time series contributing most to PC1 is at index 684 with TestID 12191141.0
The time series contributing most to PC2 is at index 103 with TestID 12581955.0
The time series contributing most to PC1 is at index 666 with TestID 3518710.0
The time series contributing most to PC2 is at index 120 with TestID 3566587.0
--------------------------------------------------- Bootstrap -------------------------------------------------------------------------------------------
Confidence Interval of 1st component
The number of sampling is 142
The boxplot of 1st Component
--------------------------------------------------- PCA Scores -------------------------------------------------------------------------------------------

4.2. Linear Regression for slope¶

R-squared and visualization¶

In [ ]:
df_list = []
def append_to_dataframe(window_name, slope1, slope2, se1, se2, n, p):
    """
    Append the regression results for one window to the module-level list df_list.

    Each record stores the two slopes, their standard errors, N and the p-value.
    """
    df_list.append({'Window': window_name, 'Slope 1': slope1, 'Slope 2': slope2, 'Se 1': se1, 'Se 2': se2, "N": n, "p_value": p})
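
The records collected above hold two slopes, two standard errors, N and a p-value. The helper `visualize_regression` is not shown, so the sketch below only illustrates one way such values can be obtained: an ordinary least-squares line fit for each series, followed by a normal-approximation test for the difference between the two slopes. The function names, the simulated series and the choice of test are assumptions, not the project's method.

```python
import math
import numpy as np

def slope_and_se(x, y):
    """OLS fit of y = a + b*x; return the slope b and its standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_centred = x - x.mean()
    sxx = np.sum(x_centred**2)
    slope = np.sum(x_centred * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    se = math.sqrt(np.sum(residuals**2) / (n - 2) / sxx)
    return slope, se

def compare_slopes(slope1, se1, slope2, se2):
    """Two-sided p-value for slope1 == slope2 under a normal approximation."""
    z = (slope1 - slope2) / math.sqrt(se1**2 + se2**2)
    return math.erfc(abs(z) / math.sqrt(2))

# Simulated series standing in for the two systems.
rng = np.random.default_rng(0)
x = np.linspace(0, 8, 40)
y1 = 0.006 - 0.0071 * x + rng.normal(0, 1e-4, size=x.size)
y2 = 0.006 - 0.0071 * x + rng.normal(0, 1e-4, size=x.size)

slope1, se1 = slope_and_se(x, y1)
slope2, se2 = slope_and_se(x, y2)
p_value = compare_slopes(slope1, se1, slope2, se2)
print(slope1, slope2, se1, se2, x.size, p_value)
```

With few observations, a t-distribution would be a more careful reference than the normal approximation used here.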
In [ ]:
append_to_dataframe('A_cal_window', *visualize_regression(fpca_s1_A_cal_window, fpca_s2_A_cal_window))
append_to_dataframe('A_sample_window', *visualize_regression(fpca_s1_A_sample_window, fpca_s2_A_sample_window))
append_to_dataframe('B_cal_window', *visualize_regression(fpca_s1_B_cal_window, fpca_s2_B_cal_window))
append_to_dataframe('B_sample_window', *visualize_regression(fpca_s1_B_sample_window, fpca_s2_B_sample_window))
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.930e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           1.09e-83
Time:                        22:40:34   Log-Likelihood:                 242.45
No. Observations:                  40   AIC:                            -480.9
Df Residuals:                      38   BIC:                            -477.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0058      0.000     30.975      0.000       0.005       0.006
x1            -0.0071   7.93e-06   -890.531      0.000      -0.007      -0.007
==============================================================================
Omnibus:                        3.406   Durbin-Watson:                   0.109
Prob(Omnibus):                  0.182   Jarque-Bera (JB):                3.120
Skew:                           0.618   Prob(JB):                        0.210
Kurtosis:                       2.415   Cond. No.                         48.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.806e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           1.49e-84
Time:                        22:40:34   Log-Likelihood:                 244.49
No. Observations:                  40   AIC:                            -485.0
Df Residuals:                      38   BIC:                            -481.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0060      0.000     33.869      0.000       0.006       0.006
x1            -0.0071   7.53e-06   -938.384      0.000      -0.007      -0.007
==============================================================================
Omnibus:                        5.735   Durbin-Watson:                   0.083
Prob(Omnibus):                  0.057   Jarque-Bera (JB):                3.128
Skew:                           0.462   Prob(JB):                        0.209
Kurtosis:                       1.988   Cond. No.                         48.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 5.446e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           8.16e-52
Time:                        22:40:34   Log-Likelihood:                 146.38
No. Observations:                  25   AIC:                            -288.8
Df Residuals:                      23   BIC:                            -286.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0157      0.000     52.697      0.000       0.015       0.016
x1            -0.0148      2e-05   -737.942      0.000      -0.015      -0.015
==============================================================================
Omnibus:                        0.529   Durbin-Watson:                   0.661
Prob(Omnibus):                  0.768   Jarque-Bera (JB):                0.461
Skew:                          -0.296   Prob(JB):                        0.794
Kurtosis:                       2.698   Cond. No.                         30.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.501e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           2.05e-53
Time:                        22:40:34   Log-Likelihood:                 150.62
No. Observations:                  25   AIC:                            -297.2
Df Residuals:                      23   BIC:                            -294.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0134      0.000     53.364      0.000       0.013       0.014
x1            -0.0147   1.69e-05   -866.059      0.000      -0.015      -0.015
==============================================================================
Omnibus:                        3.255   Durbin-Watson:                   0.195
Prob(Omnibus):                  0.196   Jarque-Bera (JB):                2.209
Skew:                           0.543   Prob(JB):                        0.331
Kurtosis:                       2.029   Cond. No.                         30.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 2.793e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):          7.17e-156
Time:                        22:40:34   Log-Likelihood:                 499.94
No. Observations:                  90   AIC:                            -995.9
Df Residuals:                      88   BIC:                            -990.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0001      0.000     -0.508      0.613      -0.001       0.000
x1             0.0020   3.84e-06    528.450      0.000       0.002       0.002
==============================================================================
Omnibus:                       13.947   Durbin-Watson:                   0.016
Prob(Omnibus):                  0.001   Jarque-Bera (JB):                8.997
Skew:                          -0.629   Prob(JB):                       0.0111
Kurtosis:                       2.097   Cond. No.                         106.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 1.663e+05
Date:                Sun, 23 Jun 2024   Prob (F-statistic):          5.68e-146
Time:                        22:40:34   Log-Likelihood:                 477.27
No. Observations:                  90   AIC:                            -950.5
Df Residuals:                      88   BIC:                            -945.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0008      0.000      2.973      0.004       0.000       0.001
x1             0.0020   4.94e-06    407.824      0.000       0.002       0.002
==============================================================================
Omnibus:                       11.543   Durbin-Watson:                   0.009
Prob(Omnibus):                  0.003   Jarque-Bera (JB):                9.163
Skew:                          -0.675   Prob(JB):                       0.0102
Kurtosis:                       2.211   Cond. No.                         106.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 1.747e+04
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           2.40e-28
Time:                        22:40:34   Log-Likelihood:                 83.320
No. Observations:                  20   AIC:                            -162.6
Df Residuals:                      18   BIC:                            -160.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0128      0.002      6.937      0.000       0.009       0.017
x1            -0.0203      0.000   -132.177      0.000      -0.021      -0.020
==============================================================================
Omnibus:                        2.319   Durbin-Watson:                   0.138
Prob(Omnibus):                  0.314   Jarque-Bera (JB):                1.832
Skew:                           0.609   Prob(JB):                        0.400
Kurtosis:                       2.154   Cond. No.                         25.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.999
Model:                            OLS   Adj. R-squared:                  0.999
Method:                 Least Squares   F-statistic:                 1.861e+04
Date:                Sun, 23 Jun 2024   Prob (F-statistic):           1.36e-28
Time:                        22:40:34   Log-Likelihood:                 83.924
No. Observations:                  20   AIC:                            -163.8
Df Residuals:                      18   BIC:                            -161.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0131      0.002      7.367      0.000       0.009       0.017
x1            -0.0203      0.000   -136.433      0.000      -0.021      -0.020
==============================================================================
Omnibus:                        2.598   Durbin-Watson:                   0.133
Prob(Omnibus):                  0.273   Jarque-Bera (JB):                1.958
Skew:                           0.617   Prob(JB):                        0.376
Kurtosis:                       2.092   Cond. No.                         25.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
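
Each summary above reports a slope (`x1`) and its standard error for one window and system. As a minimal sketch, assuming a plain OLS fit of one curve against its time index, the same two quantities can be computed with NumPy alone. The names `ols_slope_and_se`, `t`, and `y` are hypothetical, and the synthetic curve only stands in for real sensor data.

```python
import numpy as np

def ols_slope_and_se(t, y):
    """Return (intercept, slope, slope_se) for y = a + b*t fitted by OLS."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    t_centered = t - t.mean()
    sxx = np.sum(t_centered ** 2)
    # Closed-form least-squares estimates for a single predictor
    slope = np.sum(t_centered * (y - y.mean())) / sxx
    intercept = y.mean() - slope * t.mean()
    # Residual variance with n - 2 degrees of freedom (nonrobust covariance)
    residuals = y - (intercept + slope * t)
    sigma2 = np.sum(residuals ** 2) / (n - 2)
    slope_se = np.sqrt(sigma2 / sxx)
    return intercept, slope, slope_se

# Synthetic 40-point window (assumption): a line plus a small wiggle
t = np.arange(40)
y = 0.006 - 0.0071 * t + 1e-4 * np.sin(t)
intercept, slope, slope_se = ols_slope_and_se(t, y)
print(f"slope={slope:.6f}, se={slope_se:.2e}")
```

The Durbin-Watson statistics in the summaries are close to 0, which points to strongly autocorrelated residuals, so nonrobust standard errors of this kind are probably optimistic.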

Slopes Results Comparison for one sample¶

In [ ]:
slopes_df = pd.DataFrame(df_list)
slopes_df
Out[ ]:
   Window            Slope 1    Slope 2    Se 1      Se 2      N   p_value
0  A_cal_window     -0.007061  -0.007069  0.000008  0.000008  40  0.44
1  A_sample_window  -0.014793  -0.014652  0.000020  0.000017  25  0.00
2  B_cal_window      0.002030   0.002015  0.000004  0.000005  90  0.02
3  B_sample_window  -0.020282  -0.020312  0.000153  0.000149  20  0.89
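
The `p_value` column appears to test whether the two systems share the same slope within each window. This section does not show how it was computed; one common choice, sketched here as an assumption, is a two-sided z-test on the slope difference using the two standard errors. The function name `slope_difference_p_value` is hypothetical, and the inputs are the rounded values from the table.

```python
import math

def slope_difference_p_value(slope_1, slope_2, se_1, se_2):
    """Two-sided p-value for H0: slope_1 == slope_2 (normal approximation)."""
    z = (slope_1 - slope_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

# (Slope 1, Slope 2, Se 1, Se 2), copied from the rounded table values
rows = {
    "A_cal_window": (-0.007061, -0.007069, 0.000008, 0.000008),
    "A_sample_window": (-0.014793, -0.014652, 0.000020, 0.000017),
    "B_cal_window": (0.002030, 0.002015, 0.000004, 0.000005),
    "B_sample_window": (-0.020282, -0.020312, 0.000153, 0.000149),
}

p_values = {name: slope_difference_p_value(*vals) for name, vals in rows.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

Because the inputs are rounded to six decimals, small differences from the tabulated p-values are expected, most visibly for `A_cal_window`, where the standard errors have a single significant digit. Agreement on the other windows is consistent with a test of this kind but does not confirm the method the notebook used.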

5. Functional Regression¶

Functional regression is another functional data analysis (FDA) method. Unlike FPCA, which summarizes each curve by a few component scores, this analysis uses the entire time series from the balanced, centered dataset as the response and regresses it on the features, grouped by bins. The goal is to see whether the two systems respond differently to those features.
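
To make the idea concrete, here is a minimal sketch of a function-on-scalar regression, where each curve is the response and a scalar feature is the predictor. It fits an ordinary least-squares model separately at every time point with NumPy. That is a simpler stand-in for the Fourier-basis fit reported in this section, not the notebook's `Function_regression` helper. The data are synthetic and all names (`pointwise_function_on_scalar`, `curves`, `age`) are hypothetical.

```python
import numpy as np

def pointwise_function_on_scalar(curves, x):
    """Fit curves[i, t] = beta0[t] + beta1[t] * x[i] by OLS at every time t."""
    curves = np.asarray(curves, dtype=float)        # shape (n_curves, n_times)
    x = np.asarray(x, dtype=float)                  # shape (n_curves,)
    design = np.column_stack([np.ones_like(x), x])  # shape (n_curves, 2)
    # lstsq solves all time points at once: one column of `curves` per time
    coef, *_ = np.linalg.lstsq(design, curves, rcond=None)
    return coef[0], coef[1]                         # beta0(t), beta1(t)

# Synthetic example (assumption): 30 curves observed at 40 time stamps
rng = np.random.default_rng(0)
n_curves, n_times = 30, 40
t = np.arange(n_times)
age = rng.uniform(0, 250, size=n_curves)            # scalar feature per curve
true_intercept = 0.5 * np.exp(-t / 10)
true_slope = 0.01 * t / n_times
noise = rng.normal(0, 1e-3, size=(n_curves, n_times))
curves = true_intercept + np.outer(age, true_slope) + noise

beta0, beta1 = pointwise_function_on_scalar(curves, age)
print("max |beta1 - true_slope| =", np.max(np.abs(beta1 - true_slope)))
```

Fitting the same model to each system separately and comparing the two `beta1` curves mirrors the comparison this section aims for, without any smoothing across time.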

5.1. Regression coefficients¶

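The cells below print the fitted model for each sensor, window, and system: the intercept and the coefficient of each feature, both expressed as coefficients in a Fourier basis.

  • Note: Because the coefficients differ widely in magnitude, the visualizations in 5.2 are restricted to a chosen range of time stamps.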
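
One likely source of the very large values (on the order of 1e11 to 1e14) among the last coefficients printed in this section is that the Fourier basis has more functions than the sampling grid can identify. The sketch below checks this with NumPy under two assumptions that the outputs suggest but do not state: the curves are sampled at integer time stamps 0, 1, ..., n-1, and the basis is ordered as constant, sin, cos, sin, cos, ... with the period printed in each summary. The helper `fourier_design` is hypothetical and omits the basis normalization, which does not change the rank.

```python
import numpy as np

def fourier_design(n_points, n_basis, period):
    """Evaluate an unnormalized Fourier basis on the integer grid 0..n_points-1."""
    t = np.arange(n_points, dtype=float)
    columns = [np.ones_like(t)]                 # constant term
    n_pairs = (n_basis - 1) // 2                # number of sin/cos pairs
    for k in range(1, n_pairs + 1):
        omega = 2.0 * np.pi * k / period
        columns.append(np.sin(omega * t))
        columns.append(np.cos(omega * t))
    return np.column_stack(columns)

# (n_points, n_basis) pairs read off the printed summaries; period = n_points - 1
settings = [(40, 41), (25, 25), (90, 91), (20, 21)]
ranks = {}
for n_points, n_basis in settings:
    design = fourier_design(n_points, n_basis, period=n_points - 1)
    # Explicit tolerance so round-off in the aliased columns is not counted
    ranks[(n_points, n_basis)] = int(np.linalg.matrix_rank(design, tol=1e-8))
    print(f"{n_points} points, {n_basis} basis functions -> rank {ranks[(n_points, n_basis)]}")
```

Under these assumptions every design matrix is rank-deficient (for example, rank 39 for 41 functions on 40 points), so the highest-frequency terms are aliased on the grid and their coefficients are not uniquely determined. Using fewer basis functions or adding a roughness penalty are the usual remedies.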

Sensor A¶

Cal window¶

In [ ]:
print("System 1:")
A1_cal_window_funct_reg = Function_regression(A1_cal_window_combine_balanced, 40, ["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n", "System 2:")
A2_cal_window_funct_reg = Function_regression(A2_cal_window_combine_balanced, 40, ["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 39.0),), n_basis=41, period=39.0),
    coefficients=[[ 5.44831087e-01 -2.48517680e-01 -3.73647104e-02 -1.03891504e-01
      -2.58439000e-02 -5.99697051e-02 -2.65371333e-03 -7.00853023e-02
       9.57908317e-03 -5.51903945e-02  8.00488064e-03 -8.65443174e-02
      -8.70556984e-03 -3.42938125e-02 -3.66511518e-02 -3.07548887e-02
       4.96307464e-04 -5.53743332e-02 -4.09042855e-02 -2.42214352e-02
      -8.89843291e-03 -4.75685312e-02  4.60845094e-03 -2.92784615e-02
      -1.39147659e-02 -3.74305179e-02 -4.76644256e-02 -3.03830126e-02
      -2.91756320e-03 -3.78845803e-02  6.71415720e-03 -9.21303261e-02
      -1.40742815e-02 -1.01446250e-01 -8.15209354e-03 -1.64676788e-01
      -8.51399471e-03 -2.53666543e+13  2.99952283e+11 -2.53666543e+13
      -2.99952283e+11]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 39.0),), n_basis=41, period=39.0),
    coefficients=[[ 1.18738419e-02 -5.55202998e-03 -7.12022128e-04 -2.01225702e-03
      -6.61002740e-04 -1.27536169e-03  1.30976241e-04 -9.66391188e-04
       3.46259348e-04 -1.14881492e-03  1.55435324e-04 -8.70642605e-04
      -1.69066303e-04 -8.00293848e-04  2.84906358e-05 -6.45470177e-04
      -8.68556409e-05 -4.06860407e-04  3.85114633e-04 -9.96990616e-04
       2.53289787e-04 -8.67523746e-04  1.00469545e-04 -8.74869444e-04
      -8.45185813e-05 -6.43713834e-04  5.97328159e-05 -9.55442707e-04
       3.00038413e-05 -1.11627423e-03  5.11032860e-04 -1.56760635e-03
       4.87036834e-04 -2.56032343e-03 -1.89085071e-04 -3.50338940e-03
      -1.85204082e-04 -5.35453548e+11  6.33156082e+09 -5.35453548e+11
      -6.33156082e+09]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 39.0),), n_basis=41, period=39.0),
    coefficients=[[ 4.72310867e-01 -2.37365658e-01 -2.29396697e-02 -9.99958519e-02
      -2.76898793e-02 -4.65937857e-02  4.26908362e-03 -5.33222306e-02
       2.80346162e-02 -6.79134955e-02  2.86997462e-02 -6.85677626e-02
       3.11028938e-03 -5.02731685e-02  2.72785356e-02  2.79043366e-03
       2.78878282e-02  2.95297674e-04  1.17621366e-02 -2.18320684e-02
       3.98484811e-02 -8.32573466e-02  2.20214534e-02 -2.61056086e-02
      -4.82826156e-03  8.78993517e-03 -1.82064246e-02  8.84172669e-03
       1.45998829e-02 -5.31498017e-02  1.19681365e-02 -6.52269710e-02
       1.31741005e-02 -1.39051936e-01 -1.39764898e-02 -1.33792448e-01
      -5.72656980e-02 -2.31724916e+13  2.74007036e+11 -2.31724916e+13
      -2.74007036e+11]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 39.0),), n_basis=41, period=39.0),
    coefficients=[[ 1.47748435e-02 -6.88741729e-03 -8.12301186e-04 -2.47091351e-03
      -7.38218480e-04 -1.68662050e-03  1.52403969e-04 -1.38267931e-03
       3.07366183e-04 -1.24752567e-03  6.91503697e-05 -1.29602409e-03
      -3.20771930e-04 -9.57715979e-04 -5.09968479e-04 -1.08670242e-03
      -3.51221107e-04 -9.35432749e-04 -6.53541798e-05 -1.20088600e-03
      -4.82059248e-05 -8.09957732e-04 -3.22529885e-05 -1.10443840e-03
      -2.42517985e-04 -1.23124520e-03 -1.89463087e-04 -1.36358304e-03
      -9.49074684e-05 -1.28854051e-03  4.94784472e-04 -2.05294459e-03
       3.46129739e-04 -2.88283496e-03 -2.23840909e-04 -4.56072005e-03
       1.50559251e-04 -6.72856470e+11  7.95630485e+09 -6.72856470e+11
      -7.95630485e+09]]) 

Sample window¶

In [ ]:
print("System 1:")
A1_sample_window_funct_reg = Function_regression(A1_sample_window_combine_balanced, 25, ["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n", "System 2:")
A2_sample_window_funct_reg = Function_regression(A2_sample_window_combine_balanced, 25, ["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 24.0),), n_basis=25, period=24.0),
    coefficients=[[-3.47405267e-01  9.45639274e-02  6.96805443e-02 -3.41192074e-02
       1.35924638e-02 -7.74780484e-02 -4.10167328e-02 -1.16125792e-02
      -9.53707165e-02 -7.81899937e-02 -2.43494384e-01  2.55393357e-01
      -8.78382363e-02 -1.03298321e-02  1.69542022e-02  1.00396395e-01
       9.25932670e-03  9.91599491e-02  5.82284577e-02  8.11088770e-02
       1.22798698e-01  2.13497360e-01  3.75288394e-02  1.19839537e+14
       2.30508497e-01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 24.0),), n_basis=25, period=24.0),
    coefficients=[[ 3.83486978e-03 -1.28794880e-03 -7.82546285e-04  1.31167284e-04
      -1.02832165e-04  9.68728150e-04  5.59264870e-04 -8.66213788e-05
       1.10147068e-03  1.00261966e-03  3.26236614e-03 -3.66247700e-03
       1.07275973e-03 -1.58717448e-04 -4.67277915e-04 -1.49344973e-03
      -5.33692231e-06 -1.39933620e-03 -6.89587321e-04 -1.52795476e-03
      -1.40133434e-03 -2.67820855e-03 -7.79961720e-04 -1.69909044e+12
      -3.25960820e-03]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 24.0),), n_basis=25, period=24.0),
    coefficients=[[ 2.27267912e-01 -9.49063655e-02 -2.51954581e-02 -9.26852018e-03
      -1.23526635e-02 -1.52229697e-02  6.31371005e-02 -1.73891287e-01
       2.65736930e-02  4.31894900e-03  2.27511466e-02 -1.69114550e-01
       1.17173387e-01 -1.76855750e-01 -4.07989176e-02 -4.27431202e-02
      -8.38436819e-02 -3.42118956e-02 -9.75115518e-02 -1.55541589e-01
      -6.37413492e-02 -1.55948522e-01  1.46022119e-01 -8.00315331e+13
      -1.22042466e-01]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 24.0),), n_basis=25, period=24.0),
    coefficients=[[ 5.19883123e-03 -1.48621696e-03 -1.60281428e-03 -3.85195201e-04
      -1.73855531e-04  1.67866379e-03  3.08372260e-04  9.96956359e-04
       1.52496540e-03  1.70474957e-03  5.49096855e-03 -6.06508302e-03
       6.25321914e-04  2.70277003e-04 -1.12044687e-03 -2.93449431e-03
       1.27256199e-03 -2.33846456e-03 -1.22490320e-04 -2.69124677e-03
      -1.57259327e-03 -3.26346222e-03 -3.22476788e-03 -2.59648512e+12
      -5.20346777e-03]]) 

Sensor B¶

Cal window¶

In [ ]:
print("System 1:")
B1_cal_window_funct_reg = Function_regression(B1_cal_window_combine_balanced, 90, ["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n", "System 2:")
B2_cal_window_funct_reg = Function_regression(B2_cal_window_combine_balanced, 90, ["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 89.0),), n_basis=91, period=89.0),
    coefficients=[[ 1.41944188e+01 -6.18075643e+00 -1.30919887e+00 -2.39368608e+00
      -1.83134625e-01 -2.36676515e+00  3.97693796e-01 -1.41883728e+00
      -5.40517823e-01 -4.95265371e-01 -4.54411238e-01 -3.50507286e-01
       1.08951474e+00 -7.64794470e-01  6.79133545e-01 -8.38301554e-01
       6.22457016e-02 -5.87978400e-01  4.96112543e-01 -8.72747297e-01
       1.05314002e+00 -1.32193016e+00  6.16687582e-01 -1.60670875e+00
       7.05988157e-02 -1.82184111e+00  9.95172623e-02 -1.30109812e+00
      -5.05472751e-01 -1.13295633e+00 -5.26193256e-01 -9.51463359e-01
      -9.00683786e-01 -8.53833431e-01 -3.43621093e-01 -1.19133676e+00
      -1.02012295e+00  2.68934625e-01 -1.66048368e+00  4.30706708e-01
      -4.55529504e-01 -4.28099111e-01 -1.92364062e-01 -5.01679667e-01
      -7.34823350e-01 -9.58270947e-02 -5.64465162e-01 -9.19193356e-02
      -5.62647501e-01  4.29272744e-01 -7.63253088e-01  8.70708624e-01
      -6.69852477e-01  4.53599282e-01  7.00751391e-02  3.40961395e-01
       2.66664619e-01  2.16600466e-01  4.13542503e-01  1.40006634e-01
       2.69372763e-01 -5.17469534e-01  2.13189244e-01 -8.36191263e-02
       6.51723684e-01 -3.10098693e-01  4.86352995e-01 -3.90751186e-01
       2.03664818e-02 -8.39404382e-01 -3.88608894e-02 -7.22326990e-01
       4.32829163e-01 -6.49650857e-01  4.71450017e-01 -6.61520497e-01
       1.86597075e-01 -1.86009072e+00 -2.73574183e-01 -1.42965988e+00
      -5.07028944e-01 -2.38608382e-01 -6.04082448e-01 -1.28647030e+00
      -3.43943548e-01 -2.28631365e+00  1.43353149e-01 -4.88518865e+14
       4.14401855e+11 -4.88518865e+14 -4.14401855e+11]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 89.0),), n_basis=91, period=89.0),
    coefficients=[[ 2.77308778e-03 -3.02275603e-04  1.13051320e-04 -4.07613079e-05
      -1.83789224e-04  6.55023869e-04  4.78400511e-04 -7.73109568e-04
      -1.68673965e-04  3.44093887e-04 -7.41888422e-04 -1.31276533e-03
       2.02504134e-04 -1.59577018e-03  3.23105442e-04 -2.06541984e-04
      -5.44847302e-04 -1.24398518e-03  1.36200440e-04 -1.92718742e-03
       6.19120804e-05 -4.25941449e-04 -7.14207934e-04  1.38951279e-03
       7.43407211e-04  1.44489427e-04  6.43915403e-04 -1.21858565e-03
       1.38392509e-04 -6.86304157e-04 -6.51953881e-04  6.96336843e-04
      -1.03930699e-04 -6.23372649e-04 -1.63625061e-04 -9.81770329e-04
      -4.27855658e-04 -1.54221357e-03  6.06777390e-04 -3.17103293e-04
       5.58214446e-04 -6.74405462e-05 -4.07873680e-05 -4.45098304e-04
      -7.48285019e-04  5.16660497e-04 -1.01901838e-03  4.29369174e-05
      -7.59435042e-04 -1.29698468e-03  2.82353057e-04  2.34173143e-04
      -6.38391899e-05  4.20394202e-04  3.49486815e-04 -1.07113687e-04
      -1.25112525e-04  1.15613800e-03 -1.17631525e-05 -2.57567907e-04
       7.45550452e-04  1.04591972e-03  6.70687354e-04 -4.19236563e-04
      -4.60616753e-04 -8.89591011e-04  3.40731861e-04 -1.79128094e-04
       7.13527816e-04 -1.77897435e-05  9.91017163e-04  2.46021428e-04
      -7.41851710e-04 -2.01694303e-04 -3.78495391e-04 -1.82882975e-03
      -6.56182562e-04  1.15475275e-03 -8.57988004e-04  7.61039968e-04
      -7.86394694e-04 -3.98487928e-04  6.13705975e-04 -1.06347249e-03
       7.79397035e-04 -1.35696275e-03  4.05315476e-04 -9.07723879e+10
       7.70006004e+07 -9.07723879e+10 -7.70006004e+07]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 89.0),), n_basis=91, period=89.0),
    coefficients=[[ 1.71568135e+01 -7.12105429e+00 -1.55871797e+00 -2.79740396e+00
      -2.15912051e-01 -2.80706127e+00  5.42959608e-01 -1.84733622e+00
      -5.98957210e-01 -6.63148840e-01 -7.37373742e-01 -6.12628835e-01
       1.23517903e+00 -1.13236325e+00  8.15547052e-01 -8.89289354e-01
       1.62516803e-01 -6.42451128e-01  5.01874847e-01 -1.16529209e+00
       1.34854437e+00 -1.43012546e+00  6.69080449e-01 -1.75753214e+00
       1.57711185e-01 -2.09626737e+00  2.11458691e-01 -1.61611372e+00
      -4.03767548e-01 -1.56894793e+00 -8.14529808e-01 -1.16193989e+00
      -1.07491135e+00 -1.18209198e+00 -3.77325897e-01 -1.52136322e+00
      -1.38606618e+00 -1.59806700e-01 -1.88046769e+00  4.11117985e-01
      -4.20426161e-01 -4.73961643e-01 -2.61184762e-01 -6.35294867e-01
      -9.11369931e-01  1.03399251e-01 -8.83326887e-01 -6.71962007e-02
      -7.99699171e-01  3.86520671e-01 -9.24469078e-01  1.16517878e+00
      -7.84055618e-01  6.83272312e-01  2.17260909e-01  3.95777948e-01
       2.46851758e-01  5.64912734e-01  5.28764385e-01  1.81467775e-01
       3.06654834e-01 -4.20326101e-01  2.84649025e-01  1.42305409e-02
       6.20488061e-01 -3.17726804e-01  5.45232688e-01 -5.64309285e-01
       1.50569772e-01 -1.09371131e+00  1.73983824e-01 -8.13898495e-01
       4.36953860e-01 -9.39762811e-01  3.93244656e-01 -1.22890210e+00
      -2.57587822e-02 -2.23930604e+00 -4.19982626e-01 -1.39309666e+00
      -6.12658396e-01 -3.46651093e-01 -5.66124789e-01 -1.49707973e+00
      -3.02226452e-01 -2.90835120e+00  3.20798153e-01 -5.88305340e+14
       4.99048945e+11 -5.88305340e+14 -4.99048945e+11]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 89.0),), n_basis=91, period=89.0),
    coefficients=[[-3.19169018e-04 -1.17826466e-03  1.20249396e-04  2.20942726e-04
      -2.26451277e-04  1.04685986e-03 -5.05308549e-06  5.20791011e-04
      -5.32174834e-04  8.14911722e-04  5.93846886e-04  6.71645274e-06
       4.08971977e-04 -1.98491439e-04  3.91884955e-04 -8.05234342e-04
      -1.43785817e-03 -1.93489640e-03  6.44847207e-04 -9.10974402e-04
      -6.69561678e-04 -1.78917124e-03 -2.99048740e-04  5.21031945e-04
       4.28242059e-04  1.46643074e-04 -2.09481053e-04 -6.96994361e-04
      -1.16849198e-03  9.84824280e-04  5.17550617e-04  1.00392962e-03
      -1.98732189e-04  8.19320004e-04 -3.59142983e-04  1.81618516e-04
       1.31458903e-03  1.68764939e-03  2.51197033e-04  2.87032217e-04
      -8.21393697e-05 -5.07134449e-04  4.02276678e-04 -3.49579255e-04
      -6.04589628e-04 -1.17088026e-03  2.74599394e-04 -1.03717045e-04
       1.08480416e-04 -3.38030455e-04  5.52205361e-04 -8.29760457e-04
      -1.05675894e-04 -3.09141357e-04 -5.87936673e-04 -1.01212529e-04
       3.26107231e-04 -9.58560086e-04 -1.50549070e-04 -2.56941177e-04
       9.68344389e-04 -1.85509372e-04  4.27845169e-04 -7.88548511e-04
       5.79180113e-04 -1.09217207e-03  5.92050633e-04  7.78146109e-04
      -8.93308793e-05  6.44044159e-04 -7.88796076e-04 -3.95799553e-05
      -4.26592200e-04  1.48979573e-03  6.14249288e-04  1.50519927e-03
       8.28337352e-04  1.96166939e-03 -3.48373980e-04 -1.79282175e-03
      -9.20258848e-04  3.71348472e-05 -2.55227829e-04 -1.37972657e-03
      -1.24669112e-04  2.45679192e-04 -2.72785877e-04  3.33713572e+09
      -2.83083281e+06  3.33713572e+09  2.83083281e+06]]) 

Sample window¶

In [ ]:
print("System 1:")
B1_sample_window_funct_reg = Function_regression(B1_sample_window_combine_balanced, 20, ["AgeOfCardInDaysAtTimeOfTest"])
print("----------------------------------------------------------------------------")
print("\n", "System 2:")
B2_sample_window_funct_reg = Function_regression(B2_sample_window_combine_balanced, 20, ["AgeOfCardInDaysAtTimeOfTest"])
System 1:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 19.0),), n_basis=21, period=19.0),
    coefficients=[[ 1.94477262e+00 -5.74698778e-01  4.94399705e-01 -1.88917790e-01
       4.59279146e-01 -4.46228280e-01  4.80121074e-01 -3.09173927e-01
       2.84898091e-01 -4.26428562e-01  4.60382637e-01 -1.77049198e-01
       4.30580453e-01 -2.28338562e-01  2.39491499e-01 -3.79670013e-01
       3.09234260e-01 -4.73380784e+14  2.33768288e+12 -4.73380784e+14
      -2.33768288e+12]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 19.0),), n_basis=21, period=19.0),
    coefficients=[[-7.09581020e-04 -8.68061993e-04 -3.74666503e-04 -8.59587528e-04
      -3.89071731e-04  6.07273996e-04 -5.45706512e-04  6.85532296e-04
       2.66439598e-04  7.82302116e-04 -1.00400795e-03 -1.08968462e-03
      -6.22492728e-04 -9.25278571e-05  6.79326947e-04 -1.04001312e-03
       6.94320739e-04  1.28710581e+11 -6.35607808e+08  1.28710581e+11
       6.35607808e+08]]) 

----------------------------------------------------------------------------

 System 2:
Model Summary: 

Intercept: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 19.0),), n_basis=21, period=19.0),
    coefficients=[[ 2.35854382e+00 -1.05727397e+00  5.17485389e-01 -3.08207946e-01
       4.18197900e-01 -3.80737342e-01  4.77544884e-01 -1.89795445e-01
       3.25098513e-01 -4.29825818e-01  3.26121332e-01 -2.56410164e-01
       4.09545358e-01 -2.97515899e-01  2.90183483e-01 -7.19536575e-01
       3.39351987e-01 -5.67897786e+14  2.80443351e+12 -5.67897786e+14
      -2.80443351e+12]]) 

Coefficient of AgeOfCardInDaysAtTimeOfTest: FDataBasis(
    _basis=FourierBasis(domain_range=((0.0, 19.0),), n_basis=21, period=19.0),
    coefficients=[[-8.82368361e-04  1.58050863e-03  1.69679372e-04 -9.60658336e-04
       6.18075451e-04 -3.04489263e-04 -4.07719307e-04 -2.19890657e-04
       4.60177294e-04  3.13117904e-04  7.77356798e-04 -5.96044021e-04
       5.66543695e-04 -4.32255887e-04  9.78368234e-04  8.73780471e-04
       9.06883416e-04  7.67085169e+10 -3.78807491e+08  7.67085169e+10
       3.78807491e+08]]) 

5.2. Coefficients visualization¶

Sensor A¶

Cal window¶

In [ ]:
coefficent_visualization(A1_cal_window_funct_reg, A2_cal_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 36), "SensorA Cal window")

Sample window¶

In [ ]:
coefficent_visualization(A1_sample_window_funct_reg, A2_sample_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 23), "SensorA sample window")

Sensor B¶

Cal window¶

In [ ]:
coefficent_visualization(B1_cal_window_funct_reg, B2_cal_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 86), "SensorB Cal window")

Sample window¶

In [ ]:
coefficent_visualization(B1_sample_window_funct_reg, B2_sample_window_funct_reg, ["AgeOfCardInDaysAtTimeOfTest"], range(1, 16), "SensorB Sample window")